Abstract

The estimation of signal frequency count in the presence of background
noise has had much discussion in the recent physics literature, and
Mandelkern [1] brings the central issues to the statistical community,
leading in turn to extensive discussion by statisticians. The primary
focus however in [1] and the accompanying discussion is on the
construction of a confidence interval. We argue that the likelihood
function and p-value function provide a comprehensive presentation of
the information available from the model and the data. This is
illustrated for Gaussian and Poisson models with lower bounds for the
mean parameter.

1 Introduction

Mandelkern [1] brings to the statistical community a seemingly simple
statistical problem that arises in high energy physics; see for example
[2], [3]. The statistical model is quite elementary but the related
inference problem has substantial scientific presence: as Pekka Sinervo,
a coauthor of Abe et al. [2], [3], expresses, "High energy physicists
have struggled with Bayesian and frequentist perspectives, with delays
of several years in certain experimental programmes hanging in the
balance".

The problem discussed in [1] can be expressed simply. A variable $y$
follows a distribution with mean $\theta = b + \mu$, where $b > 0$ is
known, the shape of the distribution is known and the parameter
$\mu \ge 0$. The goal is to extract the evidence concerning the
parameter $\mu$, and in particular to present the evidence on whether
$\mu$ is zero or is greater than zero. In the physics setting $y$ is
often a count and is viewed as the sum of a count of $y_1$ background
events and a count of $y_2$ events from a possible signal. In [2] and
[3], the signal records the presence of a possible top quark and the
data come from the collider detector at Fermilab. The background count
$y_1$ is modelled as Poisson($b$) and the count from the possible signal
as Poisson($\mu$). Following Mandelkern [1] we write
$y \sim$ Poisson($b + \mu$) and let $\theta = b + \mu$ be the Poisson
mean with the restriction $\theta \ge b$. There are additional aspects:
for example the data are obtained as subsets of more complex counts, the
background mean count $b$ is estimated, and so on, but we concentrate on
the simpler problem here. We do however illustrate how the general case
with $b$ estimated from antecedent Poisson counts can be treated within
the general theory.

The Poisson case involves a discrete distribution and this introduces
some minor complications that are best treated separately from the
essential inference aspects. Accordingly we include a discussion of the
continuous case and for simplicity consider the normal distribution for
$y$ with mean $\theta = b + \mu$ and known standard deviation.

Much statistical literature and most of the physics proposals cited by
Mandelkern [1] are concerned with the construction of confidence bands
for $\theta$ at some prescribed level of confidence. It is our view that
this leads to procedures that are essentially decision-theoretic: we
"accept" parameter values within the confidence interval and "reject"
parameter values outside the interval; a 1/0 presentation. This
accept/reject theory evolved from Neyman and Pearson [4], later
generalized as decision theory by Wald [5]. The decision-theoretic
approach dominated statistical theory until the mid 1950s, when Savage
[6] promoted the personalistic Bayesian approach and Fisher [7]
recommended an inference approach. Both these approaches make essential
use of the likelihood function: the Bayesian approach combines this with
prior information, and the inference approach emphasizes the use of the
likelihood function and the observed significance or p-value function.
The p-value function is constructed using the model and observed data,
as we shall describe in more detail below. One difficulty with the
confidence interval approach arises from the presence of the lower bound
$b$ for the parameter space; if $y$ is small then the confidence
interval can be partly or completely outside the permissible range
$[b, \infty)$ for the parameter, making apparent nonsense of an
assertion of 95% confidence. Various proposals have been put forward to
modify the confidence approach to overcome such difficulties, the most
prominent being the unified approach of Feldman and Cousins [8]. These
proposals seek an algorithm for placing a 1/0 valuation on possible
parameter values, in the framework of a prescribed confidence level. By
contrast the p-value function promoted here provides essential evidence
from the data concerning the value of the parameter; for some background
see Fraser [9].

The discussants of Mandelkern [1] also focus on the confidence interval
approach. An exception is Gleser [10], who suggests the use of the
"likelihood function as a measure of evidence about the parameters of
the model used to describe the data"; and Mandelkern [11] in his
rejoinder concurs: "it may be most appropriate to, at least in ambiguous
cases, give up the notion of characterizing experimental uncertainty
with a confidence interval ... and to present the likelihood function
for this purpose." But also Abe et al. [3] report the p-value for the
parameter value $\mu = 0$; the p-value function extends this to all
possible values of the parameter. The approach here recommends the joint
presentation of the likelihood function and the p-value function as the
evidence from the data concerning the parameter.

In Section 2 we record some discussion of the unified approach and its 
variants, and also record various anomalies associated with their use. 

In Section 3 we expand on Mandelkern's comment and discuss what we call
an inferential approach. This records the observed likelihood function
and the observed p-value function. We feel that these present the full
statistical evidence concerning the parameter, and in turn allow
appropriate individual judgments to be made concerning the parameter. An
experiment reported in [3] is analysed using the Poisson model with
background, first for known background and then allowing Poisson
variation in the background.

2 The unified approach and variants 

The construction of a confidence interval is often based on the theory
of optimal testing, and this can lead to rather anomalous behavior. An
optimality criterion typically involves averaging over the sample space,
and in many situations there are what Fisher [7, 12] called
'recognizable subsets' of the sample space, subsets that appropriately
partition the sample space. In this setting the use of overall
optimality can mean that intervals are constructed which effectively
trade performance in a single instance for average performance in a
series of instances, most of which may have recognizably different
features. In extreme cases this can give a confidence interval that is
empty or a confidence interval that is the full range for the parameter:
in such cases the overt confidence is clearly zero or 100%, in
contradiction to the prescribed or targeted confidence. For some recent
discussions with examples, see Fraser [9], where the optimality criteria
are shown to lead to decisions that are contrary to the available
evidence; see also Cox [13] on the general appropriateness of optimality
criteria.

The conventional intervals applied to examples with a bounded parameter
space also can lead to anomalous confidence intervals. Thus an optimum
confidence interval derived for the unrestricted case may well lap into
the inappropriate region $\theta < b$, this being the key issue in the
Poisson case and mentioned for the continuous case in Mandelkern [1].

Various proposed modifications to the typical central confidence
interval are discussed in Mandelkern [1]. Assume we have a scalar
variable $y$ with a continuous density $f(y; \theta)$ and with a
distribution function $F(y; \theta)$ that is stochastically increasing
in $\theta$. Denote by $y_L(\theta)$ and $y_U(\theta)$ the $\gamma$ and
$95\% + \gamma$ quantiles of $F(y; \theta)$; these bound a 95%
acceptance interval for $y$. Now let $\gamma = \gamma(\theta)$ vary with
$\theta$ but be restricted to the interval $(0, 5\%)$. The confidence
belt in the $y \times \theta$-space is the set union of the acceptance
regions $(y_L(\theta), y_U(\theta)) \times \{\theta\}$, and the
$y$-section of the two dimensional confidence belt is a 95% confidence
region and under moderate regularity will have the form
$(\theta_L(y), \theta_U(y))$. A reasonable objective is to have these
sets stay within the acceptable range $[b, \infty)$ by some
natural-seeming choice of the adjustment function $\gamma(\theta)$.

The likelihood ratio is used as one basis for deciding which points are
to go into the acceptance interval $(y_L(\theta), y_U(\theta))$ and thus
for determining $\gamma(\theta)$. Then to form the acceptance interval
the points are ordered from the largest using the ratio

$$R = \frac{L(\theta; y)}{L(\hat\theta; y)} \tag{1}$$

where $L(\theta; y) = f(y; \theta)$ and $\hat\theta = \hat\theta(y)$ is
a reference parameter value to be used with $y$. The Unified Approach of
Feldman & Cousins [8] takes $\hat\theta = \hat\theta(y)$ to be the
maximum likelihood estimate of $\theta$ under the restriction
$\theta \ge b$; for example in the Normal($\theta$, 1) case, we have
$\hat\theta = \max(b, y)$. The New Ordering approach of Giunti [14]
takes $\hat\theta$ to be a Bayesian expected value for $\theta$. Using a
somewhat different starting point Mandelkern & Schultz [15] obtain
likelihood from the distribution of $\hat\theta(y)$, which is a
marginalisation from the distribution of $y$ itself. For the normal case
this $\hat\theta$ does not depend on $y$ for $y < b$ and not
surprisingly the confidence intervals obtained by this approach are
found not to depend on $y$ for $y < b$; the resulting intervals had been
considered earlier by Ciampolillo [16].

The use of these optimizing or ordering criteria can have rather strange
effects. For, as noted, the criteria involve shifting the distribution
bound to the left for low parameter values so that the 2.5% tail
probabilities on the left and the right are changed to have less on the
left and more on the right; this has the effect for small data values of
shifting the confidence intervals to the right, away from the excluded
parameter value range. The disturbing result however is that the lower
confidence bound is no longer a 2.5% bound but something larger and
perhaps undefined. And the upper confidence bound is no longer a 97.5%
bound but something larger and perhaps undefined. Thus the individual
bounds of the confidence interval do not have the direct statistical
meaning that one would reasonably impute to them; this is particularly
serious and disturbing in a context where the lower bound is directly
addressing the issue of whether or not $\mu$ is equal to zero.

These approaches seem to seek a single construction that combines the
merits of one-sided and two-sided confidence intervals. In a sense this
is treating both $b$ and $\theta$ as parameters and having the same
construction provide conclusions about both of them. The inferential
approach of the next section emphasizes the evidence in the data about
the single parameter $\theta$, with $b$ fixed. The extension to the case
of estimated background illustrated in Section 4 emphasizes the evidence
in the data about $\theta$, in the presence of a nuisance parameter.
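
As an illustration of the likelihood-ratio ordering described above, the
following Python sketch builds the acceptance set for a single value of
the signal mean in the Poisson model with known background, using the
restricted maximum likelihood estimate max(b, y) as the reference value.
The function names, the truncation point y_max and the example values
b = 3.0, mu = 0.5 are assumptions made for illustration; they are not
taken from [1] or [8].

```python
import math


def poisson_pmf(y, theta):
    """Poisson probability of count y with mean theta > 0."""
    return math.exp(-theta + y * math.log(theta) - math.lgamma(y + 1))


def acceptance_set(mu, b, level=0.95, y_max=200):
    """Acceptance set for theta = b + mu by likelihood-ratio ordering.

    Counts are ranked by R = L(theta; y) / L(theta_hat; y), where
    theta_hat = max(b, y) is the restricted maximum likelihood estimate,
    and are added from the largest R down until the total probability
    reaches the requested level. y_max truncates the sample space
    (an assumption; it must lie well beyond the bulk of the distribution).
    """
    theta = b + mu
    ranked = []
    for y in range(y_max + 1):
        prob = poisson_pmf(y, theta)
        theta_hat = max(b, float(y))
        ratio = prob / poisson_pmf(y, theta_hat)
        ranked.append((ratio, y, prob))
    ranked.sort(reverse=True)  # largest ratio first

    accepted, coverage = [], 0.0
    for ratio, y, prob in ranked:
        accepted.append(y)
        coverage += prob
        if coverage >= level:
            break
    return sorted(accepted), coverage


# Hypothetical example values, chosen only for illustration.
region, coverage = acceptance_set(mu=0.5, b=3.0)
print(region, round(coverage, 4))
```

With these example values the accepted counts run from 0 to 7 and the
attained probability exceeds the nominal 95%, which is the usual
consequence of discreteness. Repeating the construction over a grid of
mu values and reading off, for an observed count, the mu values whose
acceptance sets contain it would give the corresponding confidence set.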

For the Poisson problem described in the introduction, Roe & Woodroofe
[17] propose the use of certain conditional probabilities as the basis
for the confidence belt construction following Feldman & Cousins [8].
Such conditioning had been proposed earlier for upper limits by Zech
[18]. Roe and Woodroofe [17] recommended the use of the conditional
distribution of $y$ given $y_1 \le y^0$, say
$g(y; \mu) = f(y \mid y_1 \le y^0; \mu)$, as recorded in equation (4) of
Mandelkern [1]. But the variable $y_1$ is not an observable variable and
hence not ancillary in the usual sense, and the proposed conditioning
does not generate a partition of the sample space. This was noted in
Woodroofe & Wang [19] and in Cousins [20], and a Bayesian approach was
proposed in Roe and Woodroofe [21]. Thus the nominal conditional
distribution does not satisfy the standard conditions for validity in
describing conditional frequencies given observed information. Also not
surprisingly, as noted by Mandelkern [1] and Cousins [20], there is a
related undercoverage which can be severe for the nominal confidence
intervals constructed.

3 The statistical evidence: the likelihood and p-value functions

Consider first a sample $y = (y_1, \ldots, y_n)$ from the
Normal($\theta$, $\sigma_0^2$) distribution with $\sigma_0^2$ known. The
likelihood function is proportional to the density for the sample mean
at the observed value $\bar y^0$, and is examined as a function of the
unknown $\theta$:

$$L(\theta) = c\,\phi\{n^{1/2}(\bar y^0 - \theta)/\sigma_0\}, \tag{2}$$

where $\phi$ is the standard normal density. The p-value function is the
probability that the sample mean is less than or equal to the observed
$\bar y^0$:

$$p(\theta) = \Phi\{n^{1/2}(\bar y^0 - \theta)/\sigma_0\}, \tag{3}$$

where $\Phi$ is the standard normal distribution function. The p-value
function uses the known sampling distribution of $\bar y$, and records
the percentile position of the observed data in the distribution having
parameter value $\theta$. The more conventional interpretation of the
p-value as "the probability of observing a result as or more extreme,
under the model" is obtained as 1 minus the p-value function when the
data is in the right tail of the distribution. Two-tailed p-values can
also be obtained if desired. As a function of $y$, $p(\theta)$ is
uniformly distributed on $(0, 1)$ under the assumed model. This
"repeated sampling" property of the p-value is the analogue of coverage
of a confidence interval.
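
The two functions (2) and (3) are simple to tabulate. The following
Python sketch evaluates them on a few parameter values; the function
names and the data values (four observations, unit standard deviation,
observed mean 2.5) are hypothetical and serve only to show the shape of
the calculation.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def gaussian_likelihood(theta, ybar_obs, n, sigma0):
    """Likelihood (2) with c = 1: phi(sqrt(n) (ybar_obs - theta) / sigma0)."""
    return STD_NORMAL.pdf(math.sqrt(n) * (ybar_obs - theta) / sigma0)


def gaussian_p_value(theta, ybar_obs, n, sigma0):
    """p-value function (3): P(sample mean <= ybar_obs) when the mean is theta."""
    return STD_NORMAL.cdf(math.sqrt(n) * (ybar_obs - theta) / sigma0)


# Hypothetical data: n = 4 observations, known sigma0 = 1, observed mean 2.5.
ybar_obs, n, sigma0 = 2.5, 4, 1.0
for theta in (1.5, 2.0, 2.5, 3.0, 3.5):
    print(theta,
          round(gaussian_likelihood(theta, ybar_obs, n, sigma0), 4),
          round(gaussian_p_value(theta, ybar_obs, n, sigma0), 4))
```

The p-value function decreases from 1 to 0 as theta increases and
equals 1/2 at the observed mean, so the printed table shows directly
where the observed data sit relative to each candidate parameter value.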

This discussion extends directly to any location model $f(y - \theta)$
for $y$. The likelihood function is

$$L(\theta) = L(\theta; y_1^0, \ldots, y_n^0) = c(y^0) \prod_{i=1}^{n} f(y_i^0 - \theta). \tag{4}$$

And the p-value function, using the sampling distribution of $\bar y$
conditional on the observed sample configuration
$a^0 = (y_1^0 - \bar y^0, \ldots, y_n^0 - \bar y^0)$, is

$$p(\theta) = \int_{-\infty}^{\bar y^0} f(\bar y \mid a^0; \theta)\, d\bar y; \tag{5}$$

in the special Gaussian case $\bar y$ is independent of $a$. This raises
essentially no new problems beyond the computation of the integral. For
this location model it can be shown that the p-value function is
identical to the integral of the likelihood function, so that

$$p(\theta) = \int_{\theta}^{\infty} L(u)\, du \Big/ \int_{-\infty}^{\infty} L(u)\, du; \tag{6}$$

thus the p-value function is identical to the posterior survivor
function (one minus the posterior distribution function) using the flat
prior $\pi(\theta)\, d\theta = d\theta$.
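
The identity (6) can be checked numerically. The Python sketch below
integrates the likelihood on a grid and, for a Gaussian error density,
compares the result with the closed form (3). The function names, the
grid settings and the sample values are assumptions chosen for
illustration; another location error density could be supplied through
the log_f argument, in which case no closed form is available for
comparison.

```python
import math
from statistics import NormalDist


def log_normal_density(e):
    """Log of the standard normal density, used here as the error density f."""
    return -0.5 * e * e - 0.5 * math.log(2.0 * math.pi)


def p_value_by_integrated_likelihood(sample, theta, log_f,
                                     half_width=12.0, steps=6000):
    """Approximate (6): integral of L over (theta, inf) over integral of L.

    Both integrals are replaced by midpoint-rule sums on a uniform grid
    centred at the sample mean. half_width and steps are tuning
    assumptions, adequate when L is negligible outside the grid.
    """
    centre = sum(sample) / len(sample)
    h = 2.0 * half_width / steps
    upper, total = 0.0, 0.0
    for j in range(steps):
        u = centre - half_width + (j + 0.5) * h
        like = math.exp(sum(log_f(y - u) for y in sample))
        total += like
        if u > theta:
            upper += like
    return upper / total


# Hypothetical sample with mean 2.5 and unit error standard deviation.
sample = [1.9, 2.4, 2.6, 3.1]
for theta in (2.0, 2.5, 3.2):
    numeric = p_value_by_integrated_likelihood(sample, theta, log_normal_density)
    exact = NormalDist().cdf(math.sqrt(len(sample)) * (2.5 - theta))
    print(theta, round(numeric, 4), round(exact, 4))
```

The two columns agree to within the grid resolution, as (6) asserts for
a location model.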

The location form of the model provides a procedure for simplifying the
data vector $(y_1, \ldots, y_n)$ to a scalar summary, $\bar y$, by
conditioning. As the distribution of the sample configuration is free of
$\theta$, no information is lost by this conditioning. The use of
$\bar y$ as the one-dimensional variable is not essential; the same
result is obtained using the maximum likelihood estimate, or in fact any
location respecting estimator of $\theta$, together with a notational
change in the expression for $a$. In the methods for more general models
this argument is applied using approximate conditioning and reexpression
of the parameter to location form.

We now return to the Poisson($\theta$) with $\theta = b + \mu$ where $b$
is known and $\mu \ge 0$. The Poisson case is simpler, in that the model
specifies a one-dimensional variable, $y$, and a one-dimensional
parameter $\theta$, so no dimension reduction is needed. The likelihood
function from an observed count $y^0$ is

$$L(\theta) = c\,\theta^{y^0} e^{-\theta} \tag{7}$$

where $\theta = b + \mu$. This can be plotted as a function of $\mu$ for
$\mu$ in $(-b, \infty)$: for $\mu$ in $[0, \infty)$ it describes the
probability at the observed data point under the assumed model; for
$\mu$ in $(-b, 0)$ it serves as a diagnostic concerning $b$, suggesting
that either the model or the computation of $b$ is not correct. The
p-value function at $y^0$ is given by the interval

$$p(\theta) = \left(F^-(y^0; \theta),\; F(y^0; \theta)\right) \tag{8}$$

of numerical values, where $F(y; \theta)$ is the Poisson($\theta$)
distribution function and $F^-(y; \theta)$ is the probability up to, but
not including, $y$ and is given by $F(y - 1; \theta)$. We use an
interval of p-values in accord with the discreteness in the problem; a
compromise is to plot the so-called mid p-value, which is
$F^-(y^0; \theta) + (1/2) f(y^0; \theta)$. In our approach an observed
$y^0$ leads to a continuum of numerical p-values for each $\theta$ being
assessed. This proposal acknowledges the discreteness explicitly and yet
does maintain the repeated sampling property of the p-value function,
that it have a uniform distribution on $(0, 1)$. Other aspects of the
discreteness problem are addressed in Brown et al. [22] and Baker [23].

As a simple example consider $b = 2$ with data $y^0 = 3$. The likelihood
and p-value functions are recorded in Figure 1. The likelihood for $\mu$
is easily understood, and particularly useful when combining data. The
interpretation of a p-value for given data value is exactly analogous to
the percentile score on, for example, a standardized test: it expresses
the percentile position of the data point relative to the parameter. For
the null condition $\theta = 2$ or $\mu = 0$ the p-value interval for
the data $y^0 = 3$ is (0.677, 0.857); a fairly central and broad range.
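
The interval (8) and the mid p-value need only the Poisson distribution
function. A minimal Python sketch, with hypothetical function names,
reproduces the interval quoted above for b = 2 and an observed count of
3, and tabulates the p-value function for a few further values of mu.

```python
import math


def poisson_pmf(y, theta):
    """Poisson probability of count y with mean theta > 0."""
    return math.exp(-theta + y * math.log(theta) - math.lgamma(y + 1))


def poisson_cdf(y, theta):
    """F(y; theta) = P(Y <= y); the empty sum gives 0 when y < 0."""
    return sum(poisson_pmf(k, theta) for k in range(y + 1))


def p_value_interval(y_obs, b, mu):
    """Interval (8): (F(y_obs - 1; theta), F(y_obs; theta)), theta = b + mu."""
    theta = b + mu
    return poisson_cdf(y_obs - 1, theta), poisson_cdf(y_obs, theta)


def mid_p_value(y_obs, b, mu):
    """Mid p-value: F(y_obs - 1; theta) + f(y_obs; theta) / 2."""
    lower, upper = p_value_interval(y_obs, b, mu)
    return 0.5 * (lower + upper)


# Known background b = 2 and observed count 3, as in the example above.
for mu in (0.0, 1.0, 2.0, 4.0):
    lower, upper = p_value_interval(3, 2.0, mu)
    print(mu, round(lower, 3), round(upper, 3), round(mid_p_value(3, 2.0, mu), 3))
```

The mid p-value is the midpoint of the interval, since the interval has
width equal to the probability of the observed count.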

If $y^0 = 0$ and $b > 0$, then $p(\mu) = (0, \exp\{-(b + \mu)\})$. This
emphasizes the lack of information in the data about $\mu$, and this
lack is most striking when $\mu = 0$ and $b$ is very small. For larger
$b$, the observed value of 0 will be further in the left tail of the
$\mu = 0$ distribution.

In Abe et al. [3] after preliminary simplification from their Table 1 we
have $b = 6.7$ with $y^0 = 27$. The likelihood function and p-value
functions are plotted in Figure 2. For the null condition $\theta = 6.7$
or $\mu = 0$ the data is in the extreme right tail and the upper and
lower p-values are essentially 1. The actual values are
$(1 - 3 \times 10^{-8},\ 1 - 10^{-8})$, thus offering very strong
evidence that $\mu > 0$.

Figure 3 shows the corresponding likelihood and p-value plot for a
Gaussian model, with $\theta = b + \mu$, where $n = 1$, $\sigma_0 = 0.5$
and $y^0 = 1.8705$ with $b$ taken to be 1.4142.

The Gaussian case is not as far removed from the Poisson as might be
thought at first. If $y \sim$ Poisson($\theta$), then $\sqrt{y}$ is
approximately distributed as Gaussian with mean $\sqrt{\theta}$ and
standard deviation 1/2, at least for large $\theta$. For comparison with
the first example and Figure 1, the p-value interval for testing
$\mu = 0$ computed using the normal approximation with continuity
correction, i.e. evaluated at $\sqrt{y \pm 0.5}$, is (0.631, 0.819).
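
The square root approximation is easily computed. In the Python sketch
below the function name is hypothetical, and setting the lower end point
to 0 for a zero count is an assumption made here so that the square root
is defined; the example call uses b = 2 and an observed count of 3 as in
the first example.

```python
import math
from statistics import NormalDist


def sqrt_normal_p_interval(y_obs, b, mu):
    """Approximate p-value interval from sqrt(y) ~ Normal(sqrt(theta), 1/2).

    The end points use the continuity-corrected counts y_obs - 0.5 and
    y_obs + 0.5. For y_obs = 0 the lower end point is set to 0
    (an assumption, so that the square root is defined).
    """
    theta = b + mu
    approx = NormalDist(mu=math.sqrt(theta), sigma=0.5)
    lower = approx.cdf(math.sqrt(y_obs - 0.5)) if y_obs > 0 else 0.0
    upper = approx.cdf(math.sqrt(y_obs + 0.5))
    return lower, upper


low, high = sqrt_normal_p_interval(y_obs=3, b=2.0, mu=0.0)
print(round(low, 3), round(high, 3))
```

The result can be set beside the exact Poisson interval (0.677, 0.857)
to judge the quality of the approximation at such a small mean.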

Figure 1: The likelihood function (top) and p-value function (bottom)
for the Poisson model, with $b = 2$ and $y^0 = 3$. For $\mu = 0$ the
p-value interval is (0.677, 0.857).

Figure 2: The likelihood function (top) and p-value function (bottom)
for the Poisson model, with $b = 6.7$ and $y^0 = 27$. For $\mu = 0$ the
upper and lower p-values are essentially 1.

Figure 3: The likelihood function (top) and p-value function (bottom)
for the normal approximation to the Poisson model, after square root
transformation with data as in Figure 1.

It is possible to use recently developed work in likelihood asymptotics
in the Poisson model with estimated background. We suppose that the
background mean count $\beta$ is an unknown parameter estimated by $b$.
To reflect the precision in this estimate, we write

$$b = y_1 / k \tag{9}$$

where $y_1$ follows a Poisson distribution with mean $k\beta$ and hence
variance $k\beta$. A value for the standard error of $b$, say
$\sigma_b$, determines a value for $k$ as $k = b/\sigma_b^2$. In [3] the
estimated standard error from Table II is 2.1, with $b = 6.7$. The
resulting p-value function is plotted in Figure 4, where it is compared
with the mid p-value function assuming the background is known. The
value of the new p-value function at $\mu = 0$ is
$1 - 2.6 \times 10^{-5}$.
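
For the figures quoted from [3], the implied value of the scale factor
can be worked out directly. The following lines are a sketch of the
arithmetic only, with the unknown background mean replaced by its
estimate in the variance; the rounded values are computed here and are
not reported in [3].

```latex
\operatorname{var}(b) = \frac{\operatorname{var}(y_1)}{k^2}
                      = \frac{k\beta}{k^2} = \frac{\beta}{k}
\quad\Longrightarrow\quad
k \approx \frac{b}{\sigma_b^2} = \frac{6.7}{(2.1)^2}
  = \frac{6.7}{4.41} \approx 1.52,
\qquad
y_1 = k\,b \approx 1.52 \times 6.7 \approx 10.2 .
```

On this reading the background estimate carries about as much
information as a single antecedent Poisson count of roughly 10.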

4 Discussion 

The p-value function, evaluated at a particular value $\theta_0$, gives
the percentile position of the observed data relative to the model with
that parameter value $\theta_0$. Our view is that the p-value provides
the key scientific evidence in the data relative to the assumed model.
In contrast a fixed level confidence approach provides a much more
limited statement that the parameter is or is not contained in a given
interval. An improvement to the confidence approach would be the
reporting of confidence limits at a continuum of confidence levels,
which is mathematically close to the p-value function approach. One can
use the p-value function to construct a confidence interval at level
$1 - \alpha$, by finding the parameter values for which the p-value
equals, say, $1 - \alpha/2$ and $\alpha/2$. However our definition of
the p-value function is intrinsically one-sided, as seems more
appropriate for the physical context of detecting a signal.
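
The inversion of a p-value function into a confidence interval can be
sketched in a few lines of Python. The Gaussian p-value function of
Section 3 is used as the example; the bisection routine, the function
names, the bracketing range and the data values are all assumptions made
for illustration, and any continuous decreasing p-value function could
be passed in its place.

```python
import math
from statistics import NormalDist


def gaussian_p_value(theta, ybar_obs, n, sigma0):
    """p-value function for a normal mean: P(sample mean <= ybar_obs; theta)."""
    return NormalDist().cdf(math.sqrt(n) * (ybar_obs - theta) / sigma0)


def solve_decreasing(p_func, target, lo, hi, tol=1e-10):
    """Bisection for p_func(theta) = target, with p_func decreasing on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if p_func(mid) > target:
            lo = mid   # p is still above the target: the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def interval_from_p_value(p_func, alpha, lo, hi):
    """Level 1 - alpha interval: theta values where p is 1 - alpha/2 and alpha/2."""
    left = solve_decreasing(p_func, 1.0 - alpha / 2.0, lo, hi)
    right = solve_decreasing(p_func, alpha / 2.0, lo, hi)
    return left, right


# Hypothetical data: n = 4, sigma0 = 1, observed mean 2.5; bracket [-10, 15].
lower, upper = interval_from_p_value(
    lambda t: gaussian_p_value(t, 2.5, 4, 1.0), alpha=0.05, lo=-10.0, hi=15.0)
print(round(lower, 3), round(upper, 3))
```

In this Gaussian case the end points agree with the familiar observed
mean plus or minus 1.96 standard errors. Varying alpha traces out the
continuum of confidence limits mentioned above.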

It is important to know how the inferential approach promoted here
generalizes to more complex models. Most realistic models will have a
parameter $\theta$ of dimension $d$, say. For this setting we might be
interested in a scalar component $\psi(\theta)$ and could then want the
p-value function for $\psi$. If more than one component of $\theta$ is
of particular interest each could be examined in turn. The essential
simplification available for this setting from recently developed
likelihood theory is that to a high order of approximation there is a
conditional model that behaves like a location model for $\psi$ with a
related scalar variable that measures this parameter. Approximations to
the corresponding observed likelihood function are given in Fraser [24]
and approximations to the p-value function are given in Fraser, Reid and
Wu [25]. These evolved from a closely related approach based on
ancillarity due to Barndorff-Nielsen which is summarized in
Barndorff-Nielsen and Cox [26]. The approach as described in [25]
requires that $y$ follow a continuous distribution in general models.
Work in progress with A.C. Davison extends this approach to the discrete
setting, and this work was used to derive the results summarized in
Figure 4.

Figure 4: The p-value function using the third order approximation
developed from [25], allowing for estimation errors in the background
signal, compared with the mid p-value assuming the background is known.

The statistical literature summarizing higher order likelihood
asymptotics is still fairly specialized, but some review or book length
treatments are available in Reid [27], Severini [28], Skovgaard [29] and
Barndorff-Nielsen and Cox [26].